The paper by Ghosh provides a useful introduction
to the main ideas underlying objective priors
and how these ideas might profitably be used by 
frequentist statisticians, both at a theoretical and 
practical level. The aspects likely to be of most inter- 
est to this group of statisticians are those concern- 
ing probability matching, allowing valid frequentist 
procedures to be derived via a formal Bayesian anal- 
ysis. But they should also be interested in priors that 
arise from decision-theoretic considerations, not least 
since the consideration of risk criteria, such as mean 
squared error for estimation or operating character- 
istic function for testing, is ubiquitous in the fre- 
quentist approach. As pointed out by the author, 
at a theoretical level the shrinkage argument, which 
I have also used extensively in the past, provides 
a neat way of deriving frequentist asymptotic re- 
sults. 

My discussion will focus on an examination of the 
main criteria that have been used to obtain objec- 
tive priors and, partly related to this, the extent to 
which the theory and practical application can be 
extended to more complex scenarios. Before launch- 
ing into this I would just like to comment on the 
commonly used term "objective" in the present con- 
text. As soon becomes apparent in this field, there 
is an array of possible criteria available for the de- 
velopment of objective priors, some of which depend 
on a specific choice of parameterization, and there 
may be no unique solution even for a given criterion. 
Thus the choice quickly ceases to be purely objective.
My own preference is to use the term "nonsubjective,"
which indicates that the prior is detached
from subjective beliefs about parameters but which 
does not impart such a strong sense of broad agree- 
ment as to what the prior should be in any particular 
case. 

1. COMPARISON OF CRITERIA 

First, a general point about alternative criteria for 
the development of objective priors. I have a strong 
preference for criteria that would lead to the use of 
properly calibrated subjective priors whenever they 
are available, so that the consideration of objective 
priors in some sense generalizes a property of a fully 
subjective Bayesian approach. In a sense this is true 
of probability matching since this leads to (approx- 
imately) correct coverage of posterior regions in hy- 
pothetical repeated sampling. This in turn implies 
that these regions will also be calibrated over re- 
peated use, as would automatically be the case if 
a properly elicited subjective prior were to be used. 
The same cannot be said for moment matching in 
the sense described in Section 5.2; there seems noth- 
ing in this criterion that would lead one to use a sub- 
jective prior when available. 

Similarly, consideration of a proper scoring rule 
in a decision-theoretic approach would indicate the 
use of an elicited subjective prior whenever one is 
available. As a consequence, I would be uneasy us- 
ing a decision-theoretic criterion that was not based 
on a proper scoring rule. For example, it does seem 
surprising that, even in the scalar parameter case, 
Jeffreys' prior turns out not to be optimal under 
the distance measure (3.13) with $\beta = -1$. The problem
is that, unlike the Bernardo criterion that arises
when $\beta = 0$ (see later), none of these distance measures
corresponds to an average regret based on some
primitive loss function that produces a (negative)
score when data $x$ are observed and a prior predictive
distribution $\pi(x)$ is adopted. So there seems to
be no obvious sense in which we would recover a sub- 
jective prior distribution whenever one is available. 
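
As a minimal illustration of what propriety of a scoring rule means in this argument (the three-point distributions below are hypothetical and are not taken from the paper), the following sketch checks numerically that the expected logarithmic score is minimized by quoting the true distribution. The regret from quoting any other distribution is then a Kullback-Leibler divergence and cannot be negative.

```python
import math

def expected_log_score(p, q):
    """Expected negative log score when outcomes follow p and q is quoted."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def regret(p, q):
    """Excess expected score from quoting q instead of the true p (a KL divergence)."""
    return expected_log_score(p, q) - expected_log_score(p, p)

# Hypothetical true distribution and candidate quoted distributions.
p_true = [0.5, 0.3, 0.2]
candidates = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [1 / 3, 1 / 3, 1 / 3]]

# Regret is zero for the true distribution and positive for the others.
regrets = [regret(p_true, q) for q in candidates]
```

A criterion built on such a rule therefore returns the elicited distribution whenever one exists, which is the property asked for above.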

Although there is some reference to predictive probability
matching in Sections 5 and 6, the paper
is largely a review of objective priors obtained via 
parametric criteria, which usually require a focus on 
one or more specified parameters of interest. This 
has certainly been the most popular area of study 
and, as a technical device for obtaining frequentist
procedures, it performs a useful function. However, 
the focus on parameters is a cause for concern for 
many Bayesian statisticians. Such approaches nor- 
mally require a specific choice of parameters of in- 
terest, such as in quantile probability matching or 
the construction of group reference priors. The idea 
that an analysis should be redone when the spotlight 
turns to alternative sets of parameters is disturbing.
In particular, in complex real-world applications
there will potentially be many parametric functions 
of interest. An alternative to quantile matching is 
higher-order matching for highest posterior density 
or other regions, which may not require a specific 
choice of interest parameters. However, there is an 
infinite variety of ways in which a region can be 
chosen. Indeed, in the scalar parameter case, given 
any prior it is possible to choose the region in such 
a way that higher-order matching is achieved (Sev- 
erini, 1993; Sweeting, 1999). 

An alternative approach is to study the behav- 
ior of predictive distributions. This is appealing as 
the parameterization then becomes irrelevant. Just 
as in the parametric case one can consider predic- 
tive probability matching (Datta, Ghosh and Muk- 
erjee, 2000; Severini, Mukerjee and Ghosh, 2002) 
and predictive risk (Komaki, 1996; Sweeting, Datta 
and Ghosh, 2006), and Ghosh has contributed to 
both of these areas. In the former case the criterion
(4.23) is replaced by the following. Let $Y$ be
a future observation from the model and let $y(\pi, \alpha)$
denote the $(1 - \alpha)$-quantile of the predictive distribution
of $Y$ based on the prior $\pi$. If it is also the
case that

$$\mathrm{pr}\{Y > y(\pi, \alpha) \mid \theta\} = \alpha + O(n^{-r}),$$

then we have predictive probability matching; typically
$r$ will be 2 here. In the latter case we can consider
the regret when the prior $\pi$ is adopted and $\theta$ is
the true parameter value. Adopting the logarithmic
scoring rule $-\log \pi(y \mid x)$, which is the unique local
proper scoring rule, this has the general form

$$d_{Y|X}(\theta, \pi) = E^{\theta}\left[\log\left\{\frac{p(Y \mid \theta)}{\pi(Y \mid X)}\right\}\right]. \qquad (1)$$

Priors that attempt to control this risk might be 
considered to be more 'general purpose' than priors 
that require the specification of certain parametric 
functions. 

Having used a sensible broad criterion to obtain 
a prior, one could then go on to investigate its para- 
metric properties. For example, there may be more 
than one prior that produces the same (low) predic- 
tive risk and the choice between these priors might 
be made on the basis of a particular interest parameterization.
In Examples 1 and 2 of the paper the
right Haar prior $\pi(\mu, \sigma) \propto \sigma^{-1}$ is exactly predictive
probability matching and also arises as a minimax
prior under (1) (Liang and Barron, 2004). We can
then see that, for example, it is exactly probability
matching when the interest parameter is $\mu$ or $\sigma$ and
second-order probability matching when $\theta = \mu/\sigma$ is
the interest parameter, as shown in Example 2 (continued).
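
The exact predictive matching of the right Haar prior in the normal model can be checked by simulation. The sketch below is an illustration under stated assumptions, not a calculation from the paper: it takes $n = 3$ so that the predictive distribution, a Student $t$ with 2 degrees of freedom centred at the sample mean with scale $s\sqrt{1 + 1/n}$, has a closed-form quantile, and the values of `mu`, `sigma` and `alpha` are arbitrary choices.

```python
import math
import random
import statistics

def t2_quantile(p):
    """Quantile of Student's t with 2 degrees of freedom (closed form)."""
    u = 2.0 * p - 1.0
    return u * math.sqrt(2.0 / (1.0 - u * u))

def predictive_upper_quantile(x, alpha):
    """(1 - alpha)-quantile of the posterior predictive under the prior 1/sigma, n = 3."""
    n = len(x)
    if n != 3:
        raise ValueError("this sketch uses the closed-form t quantile for n - 1 = 2 df")
    xbar = statistics.fmean(x)
    s = statistics.stdev(x)  # sample standard deviation, divisor n - 1
    return xbar + t2_quantile(1.0 - alpha) * s * math.sqrt(1.0 + 1.0 / n)

def exceedance_frequency(mu, sigma, alpha, reps=20000, seed=0):
    """Frequency with which a future Y exceeds the predictive quantile at fixed (mu, sigma)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(3)]
        y = rng.gauss(mu, sigma)
        if y > predictive_upper_quantile(x, alpha):
            hits += 1
    return hits / reps

# Arbitrary illustrative parameter values.
freq = exceedance_frequency(mu=5.0, sigma=2.0, alpha=0.1)
```

Up to Monte Carlo error, `freq` should agree with `alpha` whatever values of `mu` and `sigma` are used, since the predictive pivot has the same distribution under repeated sampling.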

It is instructive to compare the above predictive 
risk criterion with the basic reference prior approach 
of Bernardo (1979, 2005). The reference prior crite- 
rion in Section 3.1 is maximization of the Kullback- 
Leibler divergence between the prior and posterior 
distributions. As shown by Clarke and Barron (1994), 
this is equivalent to finding the minimax solution 
under the regret

$$d_X(\theta, \pi) = E^{\theta}\left[\log\left\{\frac{p(X \mid \theta)}{\pi(X)}\right\}\right]. \qquad (2)$$

Note that (2) is based on the proper scoring rule
$-\log \pi(x)$. This may be contrasted with (1), which
is based on the proper scoring rule $-\log \pi(y \mid x)$, as
suggested by Geisser in his discussion of Bernardo 
(1979). The former is based on scoring the prior pre- 
dictive distribution, which is arguably less relevant 
than the posterior predictive distribution on which 
the latter is based. We are not so much interested 
in predicting the data already observed as new data 
yet to be observed. This distinction is reminiscent of 
model fitting, where it is the fit to as yet unobserved 
data that is more relevant than the fit to observed 
data. Note also that working in terms of the posterior
predictive distribution avoids problems of impropriety
of the prior, requiring only that $\pi(x) < \infty$.
Thus, to continue the discussion of Example 1 in 
the paper, in contrast to the predictive criterion (1), 
Jeffreys' prior emerges as the minimax solution un- 
der (2), whereas it is inadmissible under (1). 

In more complex examples (1) involves a compli- 
cated function that includes components of skewness 
and curvature of the model. However, it is argued in
Sweeting, Datta and Ghosh (2006) that it is more 
appropriate to consider the regret

$$d_{Y|X}(\tau, \pi) = E^{\tau}\left[\log\left\{\frac{\tau(Y \mid X)}{\pi(Y \mid X)}\right\}\right], \qquad (3)$$

where the expectation is taken over the joint distribution
of $X$ and $Y$ under the prior $\tau$. This is because
we are not so much interested in comparing the performance
of $\pi$ with that in a lower-dimensional submodel
at a fixed parameter value as comparing its
performance with that of other nondegenerate prior
distributions for the current model. Moreover, when
an elicited prior $\tau$ is available criterion (3) will lead
us to use this prior. An asymptotic analysis of (3) 
and the adoption of a minimax criterion, for exam- 
ple, produces sensible priors in specific examples. 
Another appealing aspect is that the asymptotic 
predictive criterion does not depend on the amount 
of prediction. 

2. MORE COMPLEX MODELS 

Some of the most important and challenging ap- 
plications of the day, such as environmental science, 
biomedicine, neuroscience and genomics, demand lar- 
ge, sophisticated and often high-dimensional mod- 
els. The results in Section 4 of the paper on first- 
and second-order matching priors are mathemati- 
cally attractive, but there is clearly a need to ex- 
plore the extent to which these results can be prof- 
itably used in more complex models. As the author 
points out in Section 6, objective priors have been 
successfully developed for a number of more com- 
plex problems. However, there remains a need for 
semi-automated procedures so that suitable "safe" 
default priors can be developed rapidly for arbitrary 
model structures. Major difficulties include the dif- 
ficulty or impossibility of obtaining a closed form 
expression for Fisher's information and, even if this 
is possible, of solving the required partial differen- 
tial equations. Levine and Casella (2003) proposed 
an algorithm for the implementation of probability 
matching priors for a single interest parameter in the 
presence of a single nuisance parameter. However, 
the implementation requires a substantial amount 
of computing time. An alternative approach is out- 
lined in Sweeting (2005), where it is shown that suit- 
able data-dependent priors can be developed in some 
cases. Staicu and Reid (2008) proposed an elegant 
analytic solution based on higher-order approximation
of the marginal posterior distribution. It seems
to me, however, that some form of data-driven approach
will be the only viable way to extend probability
matching ideas to general frameworks.

Apart from computational difficulties, the major 
theoretical difficulty of all the approaches to ob- 
jective prior construction that rely on sample size 
asymptotics is the potential breakdown of the the- 
ory in high-dimensional parameter spaces. In some 
cases it may be possible to identify directions in the 
parameter space about which the data are relatively 
uninformative. This can be conveniently explored, 
for example, via an eigenanalysis of the observed 
information matrix. Although the model is high- 
dimensional, most of the variation of the likelihood 
may take place on a lower-dimensional manifold of 
the parameter space. This means, of course, that 
the model is close to being non-identifiable, which 
causes difficulties if the parameters themselves are 
of direct interest. However, this may be amenable to 
analysis using a predictive approach. If a parameter 
only enters weakly in the model, then the predic- 
tive distribution should not depend critically on the 
prior chosen for that parameter and asymptotic the- 
ory should apply in such cases. 
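
The eigenanalysis mentioned above can be sketched on a toy case. The example below is hypothetical and not from the paper: it assumes a two-parameter linear model $y_i \sim N(\theta_1 + \theta_2 x_i, 1)$ in which the covariate is almost constant, so that the observed information is the $2 \times 2$ matrix of sums of 1, $x_i$ and $x_i^2$, and it uses the closed-form eigendecomposition of a symmetric $2 \times 2$ matrix rather than a linear algebra library.

```python
import math

def information_matrix(xs):
    """Observed information for (theta1, theta2) when y_i ~ N(theta1 + theta2 * x_i, 1)."""
    n = float(len(xs))
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return [[n, sx], [sx, sxx]]

def eigen_sym2(m):
    """Eigenpairs, in ascending order of eigenvalue, of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    if b == 0:
        raise ValueError("this sketch assumes a nonzero off-diagonal entry")
    mid = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    pairs = []
    for lam in (mid - rad, mid + rad):
        v0, v1 = b, lam - a  # solves (m - lam * I) v = 0
        norm = math.hypot(v0, v1)
        pairs.append((lam, (v0 / norm, v1 / norm)))
    return pairs

# Hypothetical covariate values clustered tightly around 1.
xs = [1.0 + 0.01 * k for k in range(-5, 6)]
(weak_val, weak_dir), (strong_val, strong_dir) = eigen_sym2(information_matrix(xs))
```

Here the small eigenvalue belongs to a direction close to $\theta_1 - \theta_2$, about which the data say very little, while the sum $\theta_1 + \theta_2$ is well determined. Predictions at covariate values near 1 depend essentially only on the latter.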

Although versions of probability matching priors 
and reference priors in nonregular cases have been 
investigated by Ghosal (1997, 1999) and Berger, Ber- 
nardo and Sun (2009), it will be a major challenge 
to develop multidimensional priors in an automatic 
way when some aspects of the model are regular and 
others nonregular. 

I suspect that the application of objective priors 
for high-dimensional problems will be of greater in- 
terest to Bayesian than to frequentist statisticians. 
Given the difficulties of deriving such priors in these 
cases, the frequentist may well abandon this route 
and explore alternative simulation-based approaches. 
On the other hand, a suitable high-dimensional prior 
is essential for the Bayesian statistician to operate 
at all. Yet the greater the dimension of the model 
the less likely it is that reliable prior information 
will be available on all the parameters, let alone on 
their mutual dependencies. Furthermore, as noted 
earlier, it is less likely that there will be just one 
or two parameters of interest, so I believe that the 
quest will focus more on the identification of safe, 
general purpose priors that allow the inclusion of 
subjective information when available, rather than 
on priors tailored to specific parameters. If this am- 
bition is realized, then the resulting priors should 
be thought of as no more than "reference" priors, in 
the broad sense of the word, and should not replace
the need for sensitivity analysis. 

3. SOME OTHER DIFFICULTIES 

Many Bayesian statisticians remain sceptical about 
the need for objective priors to represent ignorance 
and a common practice is to utilize proper but dif- 
fuse priors instead. However, care has to be taken 
that the tail behavior of such priors is not too thin, 
otherwise the prior may have the unexpected effect
of dominating the likelihood. Consider a random
sample from $N(\mu, \sigma^2)$. Suppose that $\mu$ and $\sigma^2$
are taken to be a priori independent with normal
and inverse Gamma distributions, respectively. How
diffuse should these distributions be and how sensitive
are the results to these choices? Specifically,
suppose that $X_i \sim N(\mu, \phi^{-1})$, where $\phi$ is the precision
parameter, and $\mu$, $\phi$ are a priori independent
with $\mu \sim N(0, c^{-1})$, $\phi \sim \mathrm{Gamma}(a, b)$. Suppose we
observe data 529.0, 530.0, 532.0, 533.1, 533.4, 533.6,
533.7, 534.1, 534.8, 535.3. Take $a = b = c = \epsilon$. What
is the effect of the choice of $\epsilon$? The value $c = 0.001$ is
not small enough: the "noninformative prior" dominates
the likelihood and the mean of the marginal
posterior of $\mu$ is close to zero. Effectively, this happens
because the normal tail of the prior for $\mu$ is
thinner than the Student $t$-tail of the integrated likelihood
of $\mu$. The value $c = 0.0002$ is also not sufficiently
small, although if a Gibbs sampler starting
near the sample values is run, then it will not detect 
the problem at all until after a large number of ite- 
rations and it will appear from trace plots as if the 
sampler has converged. A value of c less than 0.0001 
is needed for the likelihood to dominate the prior. 
If we run into such problems in simple models like 
this, then there has to be a great deal of concern 
for higher-dimensional models. So objective priors 
do matter; it is virtually impossible to reliably elicit 
a high-dimensional prior distribution and there are 
pitfalls associated with using vague but proper pri- 
ors. 
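
The mechanism in this example can be examined numerically. The following is a minimal sketch rather than the calculation behind the figures quoted above. It assumes that Gamma(a, b) is parameterized by shape and rate, integrates the precision out analytically, and evaluates the marginal posterior of the mean on a grid; the function names and grid limits are illustrative choices. Because of these assumptions it is meant to show how a thin normal prior tail can overwhelm the Student t-tail of the integrated likelihood, not to reproduce the particular thresholds given for c.

```python
import math

DATA = [529.0, 530.0, 532.0, 533.1, 533.4, 533.6, 533.7, 534.1, 534.8, 535.3]

def log_marginal_posterior(mu, eps, data=DATA):
    """Log marginal posterior of mu, up to a constant, with the precision integrated out.

    Assumes a = b = c = eps and a shape-rate Gamma(a, b) prior on the precision,
    which gives (b + S(mu)/2)^-(a + n/2) * exp(-c * mu^2 / 2), S(mu) = sum (x_i - mu)^2.
    """
    n = len(data)
    s = sum((x - mu) ** 2 for x in data)
    return -(eps + 0.5 * n) * math.log(eps + 0.5 * s) - 0.5 * eps * mu * mu

def posterior_mean(eps, lo=-300.0, hi=800.0, step=0.05):
    """Posterior mean of mu from a simple Riemann sum on an equally spaced grid."""
    grid = [lo + step * k for k in range(int(round((hi - lo) / step)) + 1)]
    logs = [log_marginal_posterior(m, eps) for m in grid]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    return sum(m * w for m, w in zip(grid, weights)) / sum(weights)

mean_thin_tail = posterior_mean(0.001)
mean_diffuse = posterior_mean(1e-5)
```

Under these assumptions `mean_thin_tail` lies far below the sample mean of 532.9, whereas `mean_diffuse` is essentially equal to it.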

Yet another difficulty arises when the likelihood 
does not tend to zero at the boundary of the param- 
eter space. In that case an improper prior may lead 
to an improper posterior, forcing the use of a proper 
prior. The objective selection of such a prior is likely 
to be problematic. An example is the dispersion pa- 
rameter in a Dirichlet process mixture model. Some 
authors simply set the hyperparameters in a Gamma 
prior to be very small, but clearly this requires great 
care as we know that in the limit we will obtain an 
improper posterior. 



4. CONCLUDING REMARKS 

I do think that frequentist interest in Bayesian 
statistics should be rather more than simply its po- 
tential use as a device to obtain valid frequentist 
procedures. When there is some concern about the 
priors adopted, Bayesians will often "look over their 
shoulder" at frequentist properties, if only to check 
that the prior is not producing some anomalous be- 
havior (cf. Example 3 in the paper). Likewise, fre- 
quentist statisticians should find it useful to do the 
same, possibly to provide an indication that they are 
not falling seriously foul of the conditionality prin- 
ciple, or possibly to see to what extent their confi- 
dence statements have direct probability interpreta- 
tions. Finally, I would like to thank the author for 
his interesting review of this area and for stimulat- 
ing me to think a little more about the basis for the 
construction of objective priors and the challenges 
that confront this field of research. 